Abstract

The  insider  threat  remains  one  of  the  most  vexing  problems  in  computer  security.  A  number 
of  approaches  have  been  proposed  to  detect  nefarious  insider  actions  including  user  modeling 
and  profiling  techniques,  policy  and  access  enforcement  techniques,  and  misuse  detection.  In 
this  work  we  propose  trap-based  defense  mechanisms  for  the  case  where  insiders  attempt  to 
exfiltrate  and  use  sensitive  information.  Our  goal  is  to  confuse  and  confound  the  attacker 
requiring  far  more  effort  to  identify  real  information  from  bogus  information  and  to  provide  a 
means  of  detecting  when  an  inside  attacker  attempts  to  exploit  sensitive  information.  “Decoy 
Documents”  are  automatically  generated  and  stored  on  a  file  system  with  the  aim  of  enticing 
a  malicious  insider  to  open  and  review  the  contents  of  the  documents.  The  decoy  documents 
contain  several  different  types  of  bogus  credentials  that  when  used,  trigger  an  alert.  We  also 
embed  “stealthy  beacons”  inside  the  documents  that  cause  a  signal  to  be  emitted  to  a  server 
indicating  when  and  where  the  particular  decoy  was  opened.  We  evaluate  decoy  documents  on 
honeypots  penetrated  by  attackers  demonstrating  the  feasibility  of  the  method. 


1  Introduction 

Much research in computer security has focused on the means of preventing unauthorized and illegitimate
access to systems and information. Unfortunately, the most damaging malicious activity
is  the  result  of  internal  misuse  within  an  organization,  perhaps  since  far  less  attention  has  been 
focused  inward.  Despite  classic  internal  operating  system  security  mechanisms  and  the  body  of 
work  on  formal  specification  of  security  and  access  control  policies,  including  Bell-LaPadula  [1] 
and  the  Clark-Wilson  models  [4],  we  still  have  an  extensive  insider  attack  problem.  Indeed  in  many 
cases, formal security policies are incomplete and implicit or they are purposely ignored in order
to  get  business  goals  accomplished.  There  seems  to  be  little  technology  available  to  address  the 
insider  threat  problem.  Insider  attack  has  overtaken  viruses  and  worm  attacks  as  the  most  reported 
security  incident  according  to  a  report  from  the  US  Computer  Security  Institute  (CSI)  [20].  The 
annual  Computer  Crime  and  Security  Survey  for  2007  surveyed  494  security  personnel  members 
from US corporations and government agencies, finding that insider incidents were cited by 59
percent of respondents, while only 52 percent said they had encountered a conventional virus in
the previous year. The state of the art still seems to be driven by forensic analysis after an attack,
rather than by technologies that prevent, detect, and deter insider attack.

We  define  insider  threats  by  differentiating  between  Masqueraders  (attackers  who  impersonate 
another  inside  user)  and  Traitors  (an  inside  attacker  using  their  own  legitimate  credentials).  One 
possible solution for masquerade detection involves anomaly detection [19]. In this approach, users'
actions are profiled to form a baseline of normal behavior. Subsequent monitoring flags abnormal
behaviors that exhibit large deviations from this baseline [17] as signals of a potential insider attack. The
common  strategy  to  prevent  inside  attacks  involves  policy-based  access  control  techniques  to  limit 
the  scope  of  systems  and  information  an  insider  is  authorized  to  use,  and  hence,  limit  the  damage 
the  organization  may  incur  when  an  insider  goes  awry.  Prevention  techniques  may  not  always 
succeed,  and  thus,  monitoring  and  detection  techniques  are  needed  when  prevention  fails.  In  this 
paper, we focus on different techniques aimed at detecting masqueraders and traitors.

We  note  that  some  external  attackers  can  become  insiders  when  an  outsider  attains  internal 
network  access.  Many  attacks  use  spyware  and  rootkits  [3],  which  give  outsiders  internal  access. 
Such software can easily be installed on systems from physical or digital media (e.g., email, downloads,
etc.) and allow an attacker administrator or “root” access on a machine along with a means
to gather sensitive data. Rootkits have the ability to conceal themselves and elude detection, especially
when the rootkit is previously unknown, as is true in zero-day attacks [8]. An external
attacker  that  manages  to  install  rootkits  internally  in  effect  becomes  an  insider,  thereby  multiplying 
the ability to inflict harm. Although the techniques described here may have utility for
these cases, our primary focus in this paper is on human insiders attempting to exfiltrate sensitive
information.  By  exfiltration  we  mean  unauthorized  copying  and  transmission  of  information  by 
any  means  including  human  memory. 

The  insider  attack  defense  system  described  in  this  paper  is  of  an  offensive  nature,  intended 
to  confuse  and  deceive  a  traitor  by  leveraging  uncertainty,  to  reduce  the  knowledge  they  ordinarily 
have  of  the  systems  and  data  they  might  be  authorized  to  use.  This  work  considers  methods  to  detect 
insider  attack  against  enterprise  systems  as  well  as  individual  hosts  and  laptops.  We  introduce  a 
deception  system  to  distribute  potentially  large  amounts  of  decoy  information  with  the  aim  to  detect 
nefarious  acts  as  well  as  to  increase  the  workload  of  an  attacker  to  identify  real  information  from 
bogus  information,  rather  than  providing  unfettered  access  as  broadly  exists  today.  We  developed 
a  system  to  generate  and  place  decoy  documents  within  a  file  system.  Our  system  generates  decoy 
documents  containing  decoy  credentials  that  are  monitored  (e.g.,  Gmail  credential  monitoring) 
for  misuse  and  stealthily  embedded  beacons  that  signal  an  alert  when  the  document  is  opened. 
Beacons  are  embedded  in  documents  using  methods  of  deception  and  obfuscation  gleaned  from 
studying  malcode  embedded  inside  documents  as  seen  in  the  wild  [16];  we  thus  turn  the  tables  on 
attackers. 

To achieve the goal of widespread deception we must consider methods to trap a wide variety
of potential insiders with varying levels of sophistication. Toward this goal, we developed a
proof-of-concept system we call D3, the Decoy Document Distributor system. Samples of D3-generated
documents are presented in the Appendix. The contributions of this paper include:

•  A  large-scale  automated  creation  and  management  system  for  deploying  decoys  that  can 
detect  the  presence  (and,  in  some  cases,  “identity”)  of  malicious  insiders,  or  at  least  indicate 
malicious  insider  activity. 

•  An  offensive  trap-based  defense  system  is  proposed  to  detect  masqueraders  and  traitors,  and 
to  flood  attackers  with  bogus  exfiltrated  information  that  they  must  analyze  in  order  to  find 
real information of value. Hence, our long-term goal is to flood the miscreant marketplace
with  bogus  information  devaluing  their  quarry. 

• A set of properties is proposed to guide the design of decoys and a system to automatically
generate  large  quantities  of  decoy  information  that  considers  the  level  of  sophistication  of 
the  inside  attacker. 

• A design of decoy information that combines a number of methods and monitors, both internal
and external, to detect insider exploitation using a common and ubiquitous set of baited
targets,  ordinary  looking  documents. 

-  A  watermark  is  embedded  in  the  binary  format  of  the  document  file  to  detect  when  the 
decoy  is  loaded  in  memory,  or  egressed  in  the  open  over  a  network. 

-  A  “beacon”  is  embedded  in  the  decoy  document  that  signals  a  remote  website  upon 
opening  of  the  document  indicating  the  malfeasance  of  an  insider  illicitly  reading  bait 
information. 

- If these methods fail to detect an insider attack or an exfiltration of baited documents,
the content of the documents contains bait and decoy information that is monitored
as well. Bogus logins at multiple organizations, as well as bogus but realistic bank
information, are monitored by external means.

• An easy-to-use system to broadly deploy decoys to ordinary users who are alerted by email
when  a  decoy  has  been  touched  on  their  laptops  and  personal  computers;  no  such  system 
presently  exists. 

The reader is encouraged to visit the Decoy Document Distribution (D3) website to evaluate
our technology developed to date at: http://www.cs.columbia.edu/ids/RUU/Dcubed.¹

¹ Some features are restricted for internal use only.

2 Related Work

The use of deception, or decoys, plays a valuable role in the protection of systems, networks, and
information. The first use of decoys (i.e., in the cyber domain) has been credited to Cliff Stoll
[26, 24] and detailed in his book “The Cuckoo’s Egg” [25], where he provides a thorough account
of his crusade to catch German hackers breaking into Lawrence Berkeley Laboratory computer
systems. Stoll’s methods included the use of bogus networks, systems, and documents to gather
intelligence on the German attackers who were apparently seeking state secrets. Among the many
techniques he employed, he crafted “bait” files, or in his case, bogus classified documents that really
contained non-sensitive government information, and attached “alarms” to them so that he would
know if anyone accessed them. To Stoll’s credit, a German hacker was eventually caught and it
was found that he had been selling secrets to the KGB.

Deception-based  information  resources  that  have  no  production  value  other  than  to  attract  and 
detect  adversaries  (like  those  used  by  Stoll)  are  commonly  known  as  Honeypots  [11].  Honeypots 
serve  as  effective  tools  for  profiling  attacker  behavior  and  to  gather  intelligence  to  understand 
how attackers operate. Honeypots are considered to have low false positive rates since they are
designed to capture only malicious attackers, except for perhaps an occasional mistake by innocent
users. Spitzner described how honeypots can be useful for detecting insider attack [23], in addition
to the common external threats for which they are traditionally known. He discusses the use of
honeytokens,  which  he  defines  as  “a  honeypot  that  is  not  a  computer”  [24],  citing  examples  that 
include  bogus  medical  records,  credit  card  numbers,  and  credentials,  with  descriptions  of  how 
they  can  be  used  to  detect  malicious  insiders  [23,  24].  In  current  systems,  the  decoy/honeytoken 
creation  is  a  laborious  and  manual  process  requiring  large  amounts  of  administrator  intervention. 
In  contrast,  we  propose  the  seeding  of  decoy  information  (of  various  different  types)  throughout  an 
operational  system.  Our  work  extends  these  basic  ideas  to  an  automated  system  of  managing  the 
creation  and  deployment  of  these  honeytokens. 

Yuill  et  al.  [26]  extend  the  notion  of  honeytokens  with  a  “honeyfile  system”  to  support  the 
creation  of  bait  files,  or  as  they  define  them,  “honeyfiles.”  The  honeyfile  system  is  implemented  as 
an enhancement to the Network File Server. The system allows for any file within user file space to
become  a  honeyfile  through  the  creation  of  a  record  associating  a  filename  to  userid.  The  honeyfile 
system  monitors  all  file  access  on  the  server  and  alerts  users  when  honeyfiles  have  been  accessed. 
Their  work  does  not  focus  on  the  content  or  automatic  creation  of  files,  but  they  do  elicit  some  of 
the  challenges  of  creating  deceptive  files  (with  respect  to  names)  that  we  address  in  section  4. 

In  this  paper,  we  introduce  a  set  of  properties  of  decoys  to  guide  their  design  and  maximize 
the  deception  they  induce  for  different  classes  of  insiders  who  vary  by  their  level  of  knowledge 
and  sophistication.  Bell  and  Whaley  [2]  have  described  the  structure  of  deception  as  a  process 
of hiding the real and showing the false. They introduce several methods of hiding that
include  masking,  repackaging,  and  dazzling,  along  with  three  methods  of  showing  that  include 
mimicking,  inventing,  and  decoying.  Yuill  et  al.  [27]  expand  upon  this  work  and  characterize 
deceptive  hiding  in  terms  of  how  it  defeats  an  adversary’s  discovery  process.  They  describe  an 
adversary’s  discovery  process  as  taking  three  forms:  direct  observation,  investigation  based  on 
evidence,  and  learning  from  other  people  or  agents.  Their  work  offers  a  process  model  for  creating 
deceptive  hiding  techniques  based  on  how  they  defeat  an  adversary’s  discovery  process. 

The decoy documents introduced in this paper utilize similar deception mechanisms as well
as beacons that signal a remote server and raise an alert in real time when a decoy has been opened.
Web bugs are a class of silent embedded tokens which have been used to track the
usage habits of web or email users [18]. Unfortunately, they have been most closely associated
with  unscrupulous  operators,  such  as  spammers,  virus  writers,  and  spyware  authors  who  have  used 
them to violate users’ privacy. Typically they will be embedded in the HTML portion of an email
message  as  a  non-visible  white  on  white  image,  but  they  have  also  been  demonstrated  in  other 
forms  such  as  Microsoft  Word,  Excel,  and  PowerPoint  documents  [22].  When  rendered  as  HTML, 
a  web  bug  triggers  a  server  update  which  allows  the  sender  to  note  when  and  where  the  web  bug 
was  viewed.  Animated  images  allow  the  senders  to  monitor  how  long  the  message  was  displayed. 
The  web  bugs  operate  without  alerting  the  user  of  the  tracking  mechanisms.  The  advantage  for 
legitimate  advertisers  is  that  this  allows  them  to  monitor  advertisement  effectiveness,  while  privacy 
advocates  worry  that  this  technology  can  be  misused  to  spy  on  users’  habits.  Our  work  leverages  the 
same  ideas,  but  extends  them  to  other  document  classes  and  is  more  sophisticated  in  the  methods 
used to draw attention. In addition, our targets are insiders who should have no expectation of
privacy  on  a  system  they  violate. 

3  Threat  Model  -  Level  of  Sophistication  of  the  Attacker 

The  insider  seeks  to  identify  and  avoid  the  decoys  and  abscond  with  “real”  information.  We  broadly 
define  four  monotonically  increasing  levels  of  insider  sophistication  and  capability.  Some  will  have 
tools  available  to  assist  in  deciding  what  is  a  decoy  and  what  is  real.  Others  will  only  have  their 
own  observations  and  thoughts. 

•  Low:  Direct  observation  is  the  only  tool  available.  The  adversary  largely  depends  on  what 
can  be  gleaned  from  a  first  glance.  We  strive  to  defeat  this  level  of  adversary  with  our 
beacon  documents,  even  though  decoys  with  embedded  beacons  may  be  distinguished  with 
more  advanced  tools. 

•  Medium:  A  more  thorough  investigation  can  be  performed  by  the  insider;  decisions  based  on 
other,  possibly  outside  evidence,  can  be  made.  For  example,  if  a  decoy  document  contains  a 
decoy account credential for a particular identity, an adversary may verify whether the particular
identity is real by querying an external system (such as www.whitepages.com). Such
adversaries  will  require  stronger  decoy  information  possibly  corroborated  by  other  sources 
of  evidence. 

• High: Access to the most sophisticated tools is available to the attacker (e.g., supercomputers,
other informed people who have organizational information). The “Perfect
Decoy” described in the next section may be the only decoy that an adversary of
such caliber cannot discern.

• Highly Privileged: Probably the most dangerous of all is the privileged and highly sophisticated
user. Such attackers might even be aware that the system is baited and will employ
sophisticated tools to try to analyze, disable, and avoid decoys entirely. Trapping this class
of attacker is the most difficult and beyond the scope of this paper.


4  Generating  and  Distributing  Bait 

In  order  to  create  decoys  to  bait  various  levels  of  insiders,  one  must  understand  the  core  properties 
of  a  decoy  that  will  successfully  bait  an  insider. 

4.1  Decoy  Properties 

We  enumerate  various  properties  and  means  of  measuring  these  properties  that  are  associated  with 
decoy  documents  to  ensure  their  use  will  be  likely  to  snare  an  inside  attacker. 

• Believable²: Capable of eliciting belief or trust; capable of being believed; appearing
true;  seeming  to  be  true  or  authentic. 

A good decoy should make it difficult for an adversary to discern whether they are looking at an
authentic document from a legitimate source or if they are indeed looking at a decoy. We conjecture
that  believability  of  any  particular  decoy  can  be  measured  through  experiment.  We  define  a  decoy 
believability  experiment  as  follows: 

•  Choose  two  documents  such  that  one  is  the  decoy  we  wish  to  measure  the  believability  of 
and  the  second  is  chosen  at  random  from  a  pool  of  authentic  documents. 

•  Select  a  volunteer  at  random  to  participate  in  a  user  study. 

•  The  volunteer  is  given  access  to  the  documents  chosen  in  step  one  and  tasked  to  decide  which 
of  the  two  is  authentic. 

For concreteness, we build upon the definition of “Perfect Secrecy” proposed in the cryptography
community [13] and define a “perfect decoy” to be a decoy that is chosen in a believability
experiment  with  a  probability  of  1/2  (the  outcome  that  would  be  achieved  if  the  volunteer  decided 
completely at random). That is, a perfect decoy is one that is completely indistinguishable from
a document that is not a decoy. A benefit of this definition is that the challenge of showing a decoy to be believable,
or  not,  reduces  to  the  problem  of  creating  a  “distinguisher”  that  can  decide  with  probability  better 
than  1/2. 
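
The believability experiment above can be simulated to estimate how far a given distinguisher departs from the 1/2 baseline. The following is a minimal Python sketch and is not part of D3; the function names, the pairing procedure, and the idea of passing the distinguisher as a callable are all assumptions made for illustration.

```python
import random


def run_believability_trials(distinguisher, decoys, authentic_pool,
                             trials=1000, seed=0):
    """Estimate how often `distinguisher` picks the authentic document.

    `distinguisher(doc_a, doc_b)` must return 0 or 1, the index of the
    document it judges to be authentic. (Hypothetical interface.)
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # One decoy under test and one authentic document chosen at random.
        pair = [("decoy", rng.choice(decoys)),
                ("authentic", rng.choice(authentic_pool))]
        rng.shuffle(pair)  # hide which position holds the decoy
        guess = distinguisher(pair[0][1], pair[1][1])
        if pair[guess][0] == "authentic":
            correct += 1
    return correct / trials


def advantage(success_rate):
    """Distance from the 1/2 baseline; 0 corresponds to a 'perfect decoy'."""
    return abs(success_rate - 0.5)
```

A distinguisher that guesses blindly should score close to 1/2, while one that keys on a telltale string shared by every decoy should score close to 1; the gap between the two is the quantity a decoy designer would try to drive toward zero.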

In  practice,  the  construction  of  a  “perfect  decoy”  might  be  unachievable,  especially  through 
automatic  means,  but  the  notion  remains  important  as  it  provides  a  goal  to  strive  for  in  our  design 
and  implementation  of  systems.  For  many  threat  models,  it  might  suffice  to  have  less  than  perfect 
believable decoys. For our proof-of-concept system described below, we generate receipts and
tax documents, and other common form-based documents with decoy credentials, realistic names,
addresses and logins, all information that is familiar to all users.

² For clarity, each property is provided with its definition gleaned from online dictionary sources.

We  note  that  the  believable  property  of  a  decoy  may  be  less  important  than  other  properties 
defined below since the attacker may have to open the decoy in order to decide whether the document
is real or not. The act of opening the document may be all that we need to trap the insider,
irrespective  of  the  believability  of  its  content.  Hence,  enticing  an  attacker  to  open  a  document  may 
be  a  more  effective  defense  strategy. 

•  Enticing:  highly  attractive  and  able  to  arouse  hope  or  desire;  “an  alluring  prospect”; 
lure. 

Herein lies the issue of how one measures the extent to which a decoy arouses desire: how
well does it serve as a lure? One obvious way is to create decoys containing information with monetary value,
such  as  passwords  or  credit  card  numbers  that  have  black  market  value  [15]. 

However,  enticement  depends  upon  the  attacker’s  intent.  Hence,  we  posit  that  by  defining 
several  general  categories  of  “things”  that  are  of  “attacker  interest”,  one  may  compose  decoys 
using  terms  or  words  that  correspond  to  desires  of  the  attacker  that  are  overwhelmingly  enticing. 
For  example,  if  the  attacker  desires  money,  any  document  that  mentions  or  describes  information 
that  provides  access  to  money  should  be  highly  enticing.  We  believe  we  can  measure  frequently 
occurring  (search)  terms  associated  with  major  categories  of  interest  and  use  these  as  the  constituent 
words  in  decoy  documents.  To  measure  the  effectiveness  of  this  generative  strategy,  it  should  be 
possible to execute content searches and count the number of times decoys appear in the top 10
list of displayed documents. This is also a reasonable approach to measuring how conspicuous
(defined below) the decoys become, based upon the attacker’s searches associated with their interest
and intent.
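
The top-10 counting measure described above might be sketched as follows. This is an illustrative Python sketch only: the term-count ranking is a stand-in for whatever desktop search engine an attacker would actually use, and the function names and sample data are hypothetical.

```python
def rank_documents(query, documents):
    """Rank document names by how often the query terms occur in their text.

    `documents` maps a name to its text. Documents with no matching term
    are omitted; ties are broken by name so the order is deterministic.
    """
    terms = query.lower().split()
    scores = {}
    for name, text in documents.items():
        words = text.lower().split()
        scores[name] = sum(words.count(term) for term in terms)
    matching = [name for name in scores if scores[name] > 0]
    return sorted(matching, key=lambda name: (-scores[name], name))


def decoy_hits_in_top_k(queries, documents, decoy_names, k=10):
    """Count how many decoys appear in the top-k results, summed over queries."""
    hits = 0
    for query in queries:
        top = rank_documents(query, documents)[:k]
        hits += sum(1 for name in top if name in decoy_names)
    return hits
```

Running `decoy_hits_in_top_k` over a list of search terms associated with a category of attacker interest gives one number per decoy set, which can be compared across candidate decoy vocabularies.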

• Conspicuous: easily visible; easily or clearly visible; obvious to the eye or mind; attracting
attention.

Here,  a  conspicuous  decoy  should  be  easily  found  or  observed.  When  a  user  first  logs  in,  a 
conspicuous  decoy  should  either  be  in  full  view  on  the  desktop,  or  viewable  after  one  (targeted) 
search  action.  One  simple  user  action  is  optimal  for  a  highly  conspicuous  decoy.  Thus,  a  measure 
of  conspicuousness  may  be  a  count  of  the  number  of  search  actions  needed,  on  average,  for  a  decoy 
to appear in full view. The decoy may be stored anywhere in the file system if a simple content-based
search locates it in one step. But, this search act depends upon the query executed by the user.
The query can either be a location (e.g., search for a directory named “TAX” in which the decoy
appears) or a content query (e.g., using Google Desktop Search for documents containing the word
“TAX”). In either case, if a decoy document appears after one such search, it is conspicuous. But,
this depends upon what search terms the attacker uses to query! If the decoy never appears because
the attacker used the wrong search terms, the decoy is not conspicuous. We posit that the property
of  enticing  is  likely  the  most  important  property,  and  a  formal  measure  to  evaluate  enticement  will 
generate  better  decoys.  In  summary,  an  enticing  decoy  should  be  conspicuous  to  be  an  effective 
decoy  trap. 
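
The proposed conspicuousness measure, the average number of search actions before a decoy comes into view, could be computed along the following lines. This Python sketch rests on several assumptions: each attacker session is modeled as an ordered list of queries, `make_search` is a hypothetical substring search over file paths and contents, and sessions that never surface a decoy are reported separately rather than averaged in.

```python
def make_search(files):
    """Build a toy search over `files` (path -> content).

    A query matches if it occurs in the path (location query) or in the
    content (content query), ignoring case.
    """
    def search(query):
        q = query.lower()
        return [path for path, text in files.items()
                if q in path.lower() or q in text.lower()]
    return search


def searches_until_decoy(queries, search, decoy_names):
    """Return the 1-based index of the first query that surfaces a decoy."""
    for count, query in enumerate(queries, start=1):
        if any(name in decoy_names for name in search(query)):
            return count
    return None  # the decoy was never conspicuous for this session


def mean_search_actions(sessions, search, decoy_names):
    """Return (mean searches over successful sessions, sessions that missed)."""
    counts = [searches_until_decoy(s, search, decoy_names) for s in sessions]
    found = [c for c in counts if c is not None]
    mean = sum(found) / len(found) if found else None
    return mean, len(counts) - len(found)
```

A mean close to 1 with few missed sessions would indicate a highly conspicuous decoy under the sampled queries; the result is only as representative as the query lists fed in.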

• Detectable: to discover or catch (a person) in the performance of some act: to detect
someone cheating.

We designed the decoy documents with several techniques to provide a good chance of detecting
the malfeasance of an inside attack in real-time.

• At time of application start-up, the decoy document emits a beacon alert to a remote server.

•  At  the  time  of  memory  load,  a  host-sensor,  such  as  an  AV  scanner,  may  detect  embedded 
tokens  placed  in  a  clandestine  location  of  the  document  file  format. 

•  At  the  time  of  exfiltration,  a  NIDS  such  as  Snort  may  be  used  to  detect  these  embedded 
tokens  during  the  egress  of  the  decoy  document  in  network  traffic  where  possible. 

•  At  time  of  information  exploitation  and/or  credential  misuse,  monitoring  of  decoy  logins 
and  other  credentials  embedded  in  the  document  content  by  external  systems  will  generate 
an  alert  that  is  correlated  with  the  decoy  document  in  which  the  credential  was  placed. 
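
Two of the monitors listed above, the embedded-token scan and the correlation of a misused credential with its decoy, can be illustrated with a small registry. This is a hedged Python sketch and not the D3 implementation: the `DecoyRegistry` class, its method names, and the 16-byte random token are assumptions, and the plain substring match merely stands in for a real host sensor or NIDS signature.

```python
import secrets


class DecoyRegistry:
    """Hypothetical bookkeeping that ties tokens and bait credentials to decoys."""

    def __init__(self):
        self._by_token = {}
        self._by_credential = {}

    def register(self, doc_id, credential):
        """Record a decoy and return the random token to embed in its file."""
        token = secrets.token_bytes(16)
        self._by_token[token] = doc_id
        self._by_credential[credential] = doc_id
        return token

    def scan(self, data):
        """Return decoy ids whose tokens occur in a byte stream.

        `data` could be a memory image or a captured network payload.
        """
        return [doc for token, doc in self._by_token.items() if token in data]

    def credential_used(self, credential):
        """Map a misused bait credential back to its decoy, or None."""
        return self._by_credential.get(credential)
```

For example, after `token = registry.register("decoy-17", "alice.smith@example.com")`, a payload containing `token` would make `registry.scan(payload)` return `["decoy-17"]`, and a login alert for that address would resolve to the same decoy.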

This extensive set of monitors forces the attacker to expend considerable effort to avoid detection,
and hopefully will serve as a deterrent to reduce internal malfeasance within organizations
that  deploy  such  a  trap-based  defense.  In  the  proof-of-concept  implementation  reported  in  this 
paper,  we  focus  our  evaluation  on  the  fourth  item.  We  utilize  monitors  at  our  local  IT  systems,  at 
Gmail  and  at  an  external  bank. 

•  Variability:  The  range  of  possible  outcomes  of  a  given  situation;  the  quality  of  being 
subject  to  variation. 

Attackers are humans with insider knowledge, even possibly with the knowledge that decoys
are liberally spread throughout an enterprise. Their task is to identify the real documents from the
potentially large cache of decoys. One important property of the set of decoys is that they are not
easily identifiable due to some common invariant information they all share. Otherwise, a single search or
test function would easily distinguish the real from the fake. The decoys thus must be highly
varied.

Clearly, a good decoy generator should produce an unbounded collection of enticing, conspicuous
but distinct and variable documents. They are distinct with respect to string content. If the
same sentence appears in 100 decoys, one wouldn’t consider such decoys with repetitive information
as highly variable; the common invariant sentence(s) can be used as a “signature” to find the
decoys, rendering them distinguishable (and clearly, less enticing).
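
A decoy generator could check its own output for such invariant "signatures" before deployment. The Python sketch below is one simple way to do so under stated assumptions: sentences are split naively on punctuation and newlines, comparison ignores case, and the `min_fraction` threshold is a hypothetical tuning parameter.

```python
import re
from collections import Counter


def shared_sentences(decoys, min_fraction=0.5):
    """Return sentences that occur in at least `min_fraction` of the decoys.

    Each sentence is counted at most once per document, so the result
    lists candidate signatures an attacker could search for.
    """
    counts = Counter()
    for text in decoys:
        sentences = {part.strip().lower()
                     for part in re.split(r"[.!?\n]+", text)
                     if part.strip()}
        counts.update(sentences)
    threshold = min_fraction * len(decoys)
    return sorted(s for s, n in counts.items() if n >= threshold)
```

An empty result does not prove the decoys are sufficiently varied (shared layout, metadata, or phrasing below sentence level would go unnoticed), but a non-empty one is a direct warning that a single search could separate decoys from real documents.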

• Non-interference: Something that does not hinder, obstruct, or impede.

How  might  a  decoy  interfere  with  regular  operations  of  the  legitimate  user?  One  would  expect 
that the more conspicuous a decoy is, the more it would interfere (since it could be found more
easily). Conspicuousness may help catch a thief, but the unwitting user may be ensnared as a by-product.

Although  we  seek  to  create  decoys  to  ensnare  an  inside  attacker,  a  legitimate  user  whose  data 
is  the  subject  of  an  attacker  must  still  be  able  to  identify  their  own  real  documents  from  the  planted 
decoys.  The  more  enticing  or  believable  a  decoy  document  may  be,  the  more  likely  it  would  be  to 
lead  the  user  to  confuse  it  with  a  legitimate  document  they  were  looking  for.  Our  goal  is  to  increase 
believability,  conspicuousness  and  enticingness  while  keeping  interference  low;  ideally  a  decoy 
should be completely non-interfering. There are obvious ways to measure this with real users,
once  we  have  mechanisms  for  generating  and  distributing  large  numbers  of  decoy  documents.  The 
challenge  is  to  devise  a  simple  and  easy  to  use  scheme  for  the  user  to  easily  differentiate  their  own 
documents,  and  thus  a  measure  of  interference  is  then  possible  as  a  by-product. 

We presume that the attacker, as an outsider, lacks some specific knowledge known to the creator
of the real document, or lacks access to some “physical key” owned by the user who
created the document. This crucial property therefore requires that the legitimate owner of the
document be able to easily differentiate the real document they created from the bogus ones generated
to thwart the attacker. Hence, another important property is as follows.

• Differentiable: to mark or show a difference in; constitute a difference that distinguishes;
to develop differential characteristics in; to cause differentiation of in the
course  of  development. 

It is important that decoys be “obvious” to the legitimate user to avoid interference, but “unobvious”
to the insider stealing information. How might we easily differentiate a decoy for the
legitimate  user  so  that  we  maintain  “non-interference”  with  the  user’s  own  actions  and  legitimate 
work? 

One method we are studying employs a physical solution reminiscent of the Cardano Grille³
(circa 1580) that raises the bar against insider theft. The basic concept is to embed in each decoy
and each real document a computational object (a function) that when executed (say by a mouse
click) displays a pattern in a bounded box. That pattern can appear as random as one wishes by
design,  such  as  a  2D-bar  code.  A  human  would  see  no  signal  from  the  pattern. 

For  each  decoy  and  real  document,  the  display  will  vary  in  such  a  way  that  one  can  distinguish 
between  real  and  decoy  using  a  physical  uniquely  patterned  transparent  screen  overlaid  on  the 
displayed  pattern  to  reveal  a  derived  word,  picture  or  icon  (or  some  general  indicia)  that  allows  the 
user  to  discriminate  between  real  documents  and  decoys. 

This approach requires the attacker to steal not only the user’s documents and files on their hard
drive, or in the shared file system, but also the physical overlay pattern from
the user’s pocket.
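
The grille idea can be illustrated with a toy model in which the displayed pattern is a grid of letters and the physical overlay is a secret list of cell positions. This Python sketch is purely illustrative and is not the scheme under study: the grid of letters, the indicia strings "REAL" and "FAKE", and all function names are assumptions, and `random.Random` is not a cryptographically secure source of positions or filler.

```python
import random
import string


def make_grille(size, length, seed):
    """The user's secret overlay: `length` distinct cells of a size x size grid."""
    rng = random.Random(seed)
    return rng.sample(range(size * size), length)


def make_pattern(size, grille, indicia, rng):
    """Build a random-looking grid that shows `indicia` through the grille."""
    cells = [rng.choice(string.ascii_uppercase) for _ in range(size * size)]
    for position, char in zip(grille, indicia):
        cells[position] = char  # only the grille cells carry the signal
    return cells


def read_through_grille(cells, grille):
    """What the legitimate user sees when the overlay is placed on the pattern."""
    return "".join(cells[position] for position in grille)
```

With `grille = make_grille(8, 4, seed=42)`, a real document's pattern built with the indicia "REAL" reads back as "REAL" through the overlay, while without the grille positions the grid is just 64 letters. A practical design would also need to resist an attacker who compares many patterns to infer which cells are not random.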

The remote thief who exfiltrates all of a user’s files onto a remote hard drive may be perplexed
by having hundreds of decoys amidst a few real documents; the thief should not be able to easily
differentiate between the two cases. If we store a hundred decoys for each real document, the
thief’s task is daunting; they would need to test embedded information in the documents to decide
what is real and what isn’t, which should complicate their end goals. For clarity, decoys should be
easily differentiable to the legitimate user, but not to the attacker without significant effort.

3 The relationship of this concept to the Cardano Grille was suggested by Steve Bellovin.


4.2  The  Decoy  Document  Distributor  (D3)  System 

The D3 web-based service generates and distributes decoy documents to registered users. The
general decoy properties guide the design of decoy templates in D3 that are used to generate specific
documents for download. The content of each decoy document includes several types of “bait”
information such as online banking logins provided by a collaborating financial institution4, student
accounts at Columbia, and email accounts from a popular service provider, Gmail. These decoy
credentials are “bait” and are enticing targets for different types of adversaries [15, 14]. These
particular examples of bait credentials are monitored internally and externally.

4.3  Decoy  Document  Design 

The primary goal of the trap-based defense is to detect malfeasance. Since no system is foolproof,
we propose that multiple overlapping signals be embedded in the decoy documents to ensure
detectability. Any alert generated by the multiple decoys is an indicator that some insider activity
has occurred. Since the attacker may have varying levels of sophistication, a combination of traps is
used in decoy documents to increase the likelihood that one will succeed in generating an alert. A
sophisticated attacker may, for example, disable the internal beacon, cut off network connections
to avoid communication, disable or kill local host monitoring processes, or exfiltrate
documents via a web browser without opening them locally. The documents are designed with
several means of detecting their misuse:

• embedded honeytokens: computer login accounts that provide no access to valuable
resources and that are monitored when (mis)used;

• embedded honeytoken banking login accounts created and monitored specifically for this
trap-based technology demonstration to entice financially motivated attackers;

• a network-level egress monitor that alerts whenever a marker, specially planted in the decoy
document, is detected (we are collaborating with Cornell to use Cayuga [5] for this purpose;
presently, Snort may be used as a simple signature detector as a proof of concept);

• a host-based monitor that alerts whenever a decoy document is “touched” in the file system,
for example by a copy operation;

• an embedded “beacon” that alerts a remote server at a site at Columbia, which we call SONAR.
The website emits an email to the registered user who created and downloaded the decoy
document. The implementation of document beacons is described in the next section.
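As a rough illustration of why overlapping signals help, suppose (purely as a simplifying assumption, not a measured property of our system) that a decoy carries k traps that fire independently, with trap i raising an alert with probability p_i. The chance that at least one alert is raised is then:

```latex
P(\text{at least one alert}) \;=\; 1 - \prod_{i=1}^{k} \left(1 - p_i\right)
```

For example, three hypothetical traps that each succeed half the time would give 1 - (0.5)^3 = 0.875. In practice the traps are unlikely to be fully independent, since an attacker who disables networking defeats several of them at once, so this should be read as an optimistic bound.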

4 By agreement, the institution requested that its name be withheld.

4.3.1 Beacon Implementation

The highly sophisticated attacker will likely attempt to differentiate between a real document and a
decoy by analyzing the binary file format prior to opening a file. This necessitates a design where
beacon code and watermarks in decoy documents are hidden to avoid their easy identification. The
attacker would surely avoid the decoys if they could easily identify them by a simple static test for
an embedded beacon. The beacon code can be embedded in documents in a number of ways and
made to appear statistically equivalent to its surrounding data using a blending technique called
“spectrum shaping” (see [21, 6]). Such obfuscation techniques are very hard to defeat [16].

Using common techniques developed for malware, beacons attempt to silently contact a
centralized server with a unique token embedded within the document at creation time. The token is
used to identify the decoy document and the IP address of the host accessing it. In
addition to the token and IP address, some additional data is collected, depending
on the particular document type and the rendering environment used to view the beacon
document.
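The server-side bookkeeping implied here can be sketched as follows. This is a minimal illustration and not the SONAR implementation: the class and field names (`BeaconRegistry`, `DecoyRecord`, `record_hit`) and the 128-bit token length are our own assumptions.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecoyRecord:
    owner_email: str          # registered user who generated the decoy
    doc_name: str             # name of the generated decoy document
    hits: list = field(default_factory=list)

class BeaconRegistry:
    """Maps per-document tokens to decoy records (hypothetical design)."""

    def __init__(self):
        self._by_token = {}

    def register(self, owner_email, doc_name):
        # A fresh random token is bound to the document at creation time.
        token = secrets.token_hex(16)  # 32 hex characters
        self._by_token[token] = DecoyRecord(owner_email, doc_name)
        return token

    def record_hit(self, token, source_ip, extra=None):
        """Log one beacon signal; returns the record, or None if the
        token is unknown (e.g., a malformed or forged request)."""
        record = self._by_token.get(token)
        if record is None:
            return None
        record.hits.append({
            "when": datetime.now(timezone.utc),
            "ip": source_ip,
            "extra": extra or {},  # document-type-specific data, if any
        })
        return record
```

A caller would use the returned record to decide whom to notify, for instance by emailing `owner_email` when `record_hit` succeeds.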

The  first  proof-of-concept  beacons  have  been  implemented  in  Word  and  PDF  and  deployed 
through  the  D3  website. 

4.3.2  Word 

Microsoft  Word  allows  users  to  automate  tasks  by  recording  a  set  of  common  actions  that  can  be 
triggered  on  demand.  These  “Word  Macros”  and  tasks  are  encoded  and  interpreted  in  Microsoft’s 
Visual  Basic  scripting  language. 

Due to security concerns, firewalls strictly limit the ability of Word to access the Internet.
Because of the embedded VB engine, however, Word can invoke other system objects from within a macro
script. The local browser can be invoked from within a Word macro, bypassing the firewall.
Information such as local machine directories, the user’s credentials, and the machine’s IP address can
all be encoded and passed through the firewall by the local browser agent. As long as the
document is digitally signed, Word will allow some level of macro activity on the host. The macros are
automatically triggered upon opening the document.

An alternative method that does not require macro support is suggested by [22]. A remote
image is embedded in the decoy document and rendered by Word’s document browser when the
user views the document. The D3 website supports this feature by intercepting image requests and
parsing out stealthy tokens embedded in the image request.
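A sketch of the token-parsing step on the server side is shown below. The request format is an assumption made for illustration only: we pretend the token travels in a query parameter named `t` and is 32 lowercase hexadecimal characters; the actual encoding used by D3 is not described here.

```python
from urllib.parse import urlsplit, parse_qs

HEX_DIGITS = set("0123456789abcdef")

def extract_token(request_target, param="t"):
    """Return the token carried by an image request, or None.

    `request_target` is the path-plus-query part of the HTTP request,
    e.g. "/images/logo.png?t=<token>" (hypothetical layout).
    """
    query = urlsplit(request_target).query
    values = parse_qs(query).get(param, [])
    if len(values) != 1:
        return None  # missing or ambiguous token
    token = values[0]
    # Reject anything that does not look like a token we issued.
    if len(token) == 32 and set(token) <= HEX_DIGITS:
        return token
    return None
```

Validating the shape of the token before any lookup keeps ordinary image traffic and malformed requests from being mistaken for beacon signals.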

4.3.3  Adobe  PDF 

PDF is an open standard published by the ISO and is supported on most platforms and
configurations. In the latest version, Adobe has embedded a JavaScript interpreter in the application
to verify form data as the user enters it. We leverage this feature to issue a data
request upon the initial opening of the document through some JavaScript code. The beacon contains
the token to identify the document so that the system can track individual documents as they are
read across different systems. Due to security concerns, the latest releases of Adobe Reader now
prompt the user for permission to contact a remote server. On the user’s own host, this action can
be “memorized” so that subsequent requests do not issue warnings. Earlier versions of Adobe
Reader do not show an alert, allowing beacons to silently contact the SONAR server on remote
systems as well. Not all PDF readers support JavaScript, so this particular beacon is limited on
systems where the default reader is not Adobe’s.

The  D3  site  includes  a  tutorial  guiding  the  user  on  how  to  generate,  download,  and  open  a 
newly  generated  decoy  document  to  “memorize”  beacon  triggers  to  allow  silent  communication  on 
the  host. 

4.3.4 Embedded Marker Implementation

Beacon documents contain embedded markers that a host or network sensor may detect either when
documents are loaded in memory or egressed in the open. For the initial proof-of-concept system
the markers are constructed as MD5 signatures from a set of keywords. The markers are placed
in either the beacon document’s meta-data area or embedded as a comment within the document
format structure. Both locations are ideal for embedding stealthy markers since most rendering
programs ignore these parts of the document. The embedded markers can be used in Snort signatures
for detecting exfiltration.
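The marker scheme can be sketched as follows. The details are assumptions for illustration: the text above only states that markers are MD5 signatures over a set of keywords, so the canonical joining of keywords, the "%"-style comment line, and the function names are ours. MD5 serves here only as a compact identifier, not as a security primitive.

```python
import hashlib

def make_marker(keywords):
    """Derive a marker from a set of keywords (hypothetical scheme:
    sort, join with '|', and take the MD5 hex digest)."""
    joined = "|".join(sorted(keywords)).encode("utf-8")
    return hashlib.md5(joined).hexdigest()

def embed_as_comment(body, marker):
    """Append the marker as a comment-like line that a renderer would
    ignore (a '%' line is used as a stand-in for a format comment)."""
    return body + b"\n%" + marker.encode("ascii") + b"\n"

def find_markers(payload, known_markers):
    """Return the known markers that occur in a byte payload, as a
    host or egress sensor might do with simple signature matching."""
    return [m for m in known_markers if m.encode("ascii") in payload]
```

A network sensor would apply the equivalent of `find_markers` to outbound traffic; any non-empty result indicates that a decoy is leaving the network in the clear.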

5  Experiments  using  Decoy  Documents  as  Bait 

We have defined the general properties that decoys should have and discussed how we may measure
these properties, but here we focus on the most important property: detectability. Under ideal
testing conditions, decoy efficacy could be shown through deployment on true operational systems,
either within an enterprise environment or on personal computers, by the number of attacks they
are able to detect or thwart (they have a deterrence effect). However, given reasonable time limits,
the infrequency of attacks within the insider threat model makes this approach impractical within
a university environment. As we mentioned, we are now seeking a larger user population to study
and measure decoy generation over time.

Another approach to evaluation is a user study in which users are organized and asked to
evaluate decoys based on each of the key decoy properties mentioned earlier. We take human evaluation
to be the gold standard of evaluation since the human mind is the ultimate target of our decoys.
That is, we wish to show how well our decoys can induce deception on human test subjects. One
of the challenges of conducting a traditional user study lies in the logistics of obtaining volunteers.
In our methodology, we attempt to reduce this challenge by leveraging external attackers to serve
as participants in our study. To do so, we “invite” attackers (or more accurately, bamboozle them)
into our study by attracting them with a set of vulnerable systems on the university network, which
also serve as our testing platform.

Our test platform is embedded within a honeynet [9]. It consists of several virtual machines
running Linux and configured with Sebek [10] to capture attacker activities, including commands
and file references. In order to limit potential damage from system compromise and still allow for
testing, we configured the honeynet to allow all incoming connections while restricting the number
of outgoing connections.

The virtual machine hosts within the honeynet were configured with accounts and home
directories for three decoy usernames. To make the environment as real as possible, genuine data from
personal accounts on other systems were loaded into each of the home directories. We changed
name references within the data to reflect those of the appropriate decoy users. In total, our phony
user accounts contained 15 or more directories and 50-100 files. The hosts were then seeded with
several of D3’s decoy files using the decoy distributor utility. The decoy files were generated to
have conspicuous names such as “stolen passwords”, “credit card”, “private data”, and “Gmail
AccountInfo”, but were distributed within the polluted home directories of the decoy accounts,
making the environment as real as possible.

To lure test subjects into the study, our initial approach was to use attackers that attempt to gain
internal access via password scanning. Password scanning attacks are common on the university
network, where attempts on a typical machine are in the range of thousands per day. To enable
attacker access, we conducted a short study to first determine the most common usernames and
passwords (excluding those for root and actual users) used in these attempts. We created accounts
with several of these usernames and passwords, only to quickly learn that this breed of attacker was
not going to suffice for our user study; their sole purpose seemed confined to creating zombies for
botnets. While this may be a valid threat to study while evaluating decoys [7], allowing bots to
operate on the university network poses too much risk.

In  our  second  and  more  aggressive  approach,  we  narrowed  our  recruitment  effort  to  web  forums 
and  IRC  channels  with  the  expectation  and  hope  that  we  would  get  fewer  attacks  involving  botnets. 
In  this  approach,  we  selected  several  high  volume  forums  to  solicit  volunteers  and  posted  variations 
of  invitations  with  messages  that  included  hostnames,  usernames,  and  passwords.  The  idea  was  to 
provide  just  enough  innocent-looking  information  from  a  novice  to  lure  people  into  our  machines 
without  providing  direct  evidence  that  we  were  conducting  a  deception-based  experiment.  Note 
that  we  deliberately  omit  the  names  of  the  forums  used  and  the  exact  details  of  the  messages,  as 
this  is  an  ongoing  study. 

While our methodology could, in theory, provide anyone with access to our test platform, by
selectively choosing the location and contents of the postings, we expected to recruit two
primary classes of individuals:

• Legitimate and generally curious computer-savvy individuals. These users have no interest
in extending privileges in an unauthorized way, but participate in the study out of curiosity,
as there is no other incentive.

• Unscrupulous opportunistic hackers who attempt to extend their network access by
whatever means afforded to them. These individuals are enticed by our posting as they see our
machines as “low-hanging fruit” in their targeting campaign.

In either case, we believe these individuals to be suitable candidates for our study (with one caveat
mentioned later). Both classes of individuals can be used in measuring the enticement property of
decoys. We measure this by examining the behavior exhibited in file access, both with respect to
the particular files a user attempts to read and the order in which the files are read. For example,
if all users consistently read the same file first, we know the file must indeed be enticing.
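One simple way to quantify this ordering effect is to record, for each session, the position at which a decoy is first read. The sketch below is illustrative only; the function names and the sample session data are hypothetical and are not drawn from our logs.

```python
def first_decoy_rank(accesses, decoys):
    """1-based position of the first decoy in a session's ordered list
    of file reads, or None if no decoy was read."""
    for position, path in enumerate(accesses, start=1):
        if path in decoys:
            return position
    return None

def mean_first_decoy_rank(sessions, decoys):
    """Average first-decoy position over sessions that touched a decoy.
    Lower values suggest more enticing decoys."""
    ranks = [r for r in (first_decoy_rank(s, decoys) for s in sessions)
             if r is not None]
    return sum(ranks) / len(ranks) if ranks else None

# Hypothetical sessions: each is the ordered list of files a user read.
decoys = {"stolenpasswords"}
sessions = [
    [".bash_history", "stolenpasswords", "notes.txt"],
    ["stolenpasswords"],
    ["notes.txt"],
]
```

Sessions that never touch a decoy are excluded from the mean, so this figure should be reported alongside the fraction of sessions that read any decoy at all.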

With regard to the indistinguishability of the decoys, we note that the content of these decoys
contains bait information in the form of monitored credentials on real systems. Certainly, if our
attackers take the time to use the decoy credentials, there is an implication that they must also be
believable. More importantly, though, if they use the credentials and we detect their use, we have
also answered the most important question: can we detect the attacker? Note that the first class
of individuals is, by definition, not useful for this part of the study. That is, attempting to use
credentials found on our machines is clearly an illicit activity, in which they would not partake.
Unfortunately, we do not have a good way of distinguishing the two classes, so we get slightly
skewed results.

Over  the  span  of  the  first  week  5,  our  hosts  netted  20  unique  users,  determined  anecdotally  by 
source  IP  address  and  reinforced  by  the  unique  behaviors  exhibited  by  each.  The  length  of  user 
sessions  ranged  from  minutes  to  hours.  The  users  that  spent  hours  were  focused  on  activities  that 
included  writing  code  and  attempting  to  install  other  software,  presumably  with  malicious  intent. 
The  most  common  activities  exhibited  for  users  upon  login  included  activities  such  as:  determining 
who  else  was  on  the  machine,  checking  and  deleting  the  command  history,  and  exploring  the  system 
(i.e.,  reading  the  most  conspicuous  files).  We  note  that  in  almost  all  cases,  our  decoys  were  among 
the  first  few  files  viewed  by  users  upon  initial  login  (as  noted,  viewing  the  history  file  was  also 
popular).  The  most  significant  observations  were  made  for  6  of  the  20  unique  users: 

1. There were three attempts to use Gmail credentials that were contained in a decoy document,
which triggered an alert on SONAR.

2.  One  person  succeeded  in  changing  the  password  on  a  bogus  Gmail  account,  which  also 
triggered  a  SONAR  alert. 

3.  There  were  at  least  two  attempts  to  exfiltrate  decoy  files  with  Secure  Copy  and  Secure  FTP, 
one  named  “stolenpasswords”  that  contained  credentials  to  the  university  systems. 

4.  There  was  one  attempt  to  use  the  credentials  contained  in  the  “stolenpassword”  file,  which 
we  were  alerted  to  by  the  monitor  that  signaled  an  alert  to  SONAR. 

We  take  these  results  as  evidence  that  D3/SONAR  indeed  has  value  as  an  insider  defense  system. 
While  only  5  of  the  20  users  sounded  an  alarm  on  SONAR,  we  emphasize  that  our  methodology 
did  include  an  unknown  proportion  of  benign  users. 

One flaw of our methodology that was revealed during testing was that we allowed users to
make changes to the file system. We did this deliberately to increase the realism of the environment
in the experiments. The problem this created was that it made decoy defense vulnerable to deletion
(e.g., several of our visitors executed wholesale deletion of files with “rm -rf *”). This poses a
problem in our testing methodology, but not necessarily in practice. That is, the act of deleting files
is in itself a detectable behavior that would alert monitors of suspicious behavior.

5 We note that most attacks occurred within the first 8-hour period after posting.

In this study, we assumed a level of sophistication great enough to surpass the “beacon”-embedded
decoys, and therefore omitted them. Their exclusion was prudent as they represent a different
class (i.e., designed for a less sophisticated threat model) of decoy than the documents containing
credentials due to their distinguishability (i.e., they emit pings to a central server). We did not
want to chance revealing the nature of the study to participants who were assumed to be highly
sophisticated and would notice pings. We believe the value of decoy documents to be self-evident.
We leave the reader with a link to the D3 site to try it and become part of our planned longitudinal
study.

6  Conclusion 

Our work focuses on the study and creation of bait information with the aim of exposing or
thwarting the exploitation of exfiltrated information. Although the use of bait information and similar
trap-based defenses is well known, most of those efforts have focused either on artifacts that are
logically separate from the operational systems (e.g., honeypots [23]) or on low-level snippets of
information created manually (e.g., fake database records [24]). The D3 system is a scalable and
automated trap-based defensive system that forces attackers to expend considerable effort to
identify realistic useful information from purposely planted bogus information intended to deceive.
Naturally, the probability of exposing a malicious insider with trap-based defense tactics increases
with the amount of decoy information that is generated and disseminated. D3 offers the novel
service of automatically creating and managing decoy documents, enabling the throttling of bait based
on the desired protection level or cost (e.g., interference) one is willing to pay.

A Sample D3 Documents


While  I  am  away.... 

From: Frank Secola <fsecola@gmail.com>
To: terryg@aol.com

Date: Tue, 16 Sep 2008 12:11 pm

Terry. 

I’ll be on vacation for the next 6 weeks. Please check my email and keep me apprised of anything critical while I
am gone.

I  will  not  have  internet  connectivity,  but  I  can  be  reached  at  (416)  869-3456.  If  you  need  to  make  any  purchases, 
please  use  the  credit  card  info  below. 

Thanks. 

Frank 


Gmail  username:  fsecola 
Gmail Password: wxyz1234

ATM PIN: 3993

Credit Card: 4532681078425093

CVV: 174

Exp. Date: 09/2011

Mother's  Maiden  Name:  Sheridan 

Birth  date:  03/09/1982 

Name:  Frank  Secola 

Address:  60  E  Rio  Salado.  Apt#4 

City:  Tempe 

State:  AZ 

Zip:  85281 

Tel:  480-682-5100 

NOTICE:  This  e-mail  is  intended  solely  for  the  use  of  the  individual(s)  to  whom  it  is  addressed.  If  you  believe  you 
received  this  e-mail  in  error,  please  notify  the  sender  immediately,  delete  the  e-mail  from  your  computer  and  do 
not  copy  or  disclose  it  to  anyone  else. 



Figure 1: Decoy sample email message with embedded Gmail account information.

[Image: IRS Form 1040, U.S. Individual Income Tax Return, tax year 2007 (OMB No. 1545-0074),
filled in with bogus taxpayer data. Name: Mark Myers. Social security number: 383-30-7790.
Home address: 519 Tully Street, Westland, MI 48185. Filing status: Single. Line 7: 61742.]

Figure  2:  Decoy  tax  document  with  bogus  user  information. 


half.com

Darts
DMI Sports Bristle DartBoard with Solid Wood Cabinet
Price: $69.99

Corningware
CorningWare French White 12-Piece Bake and Serve Set
Price: $46.14

Microwave
GE 0.7 cu. ft. Capacity Countertop Microwave Oven, JES738WJ
Price: $49.77

AA
Duracell Batteries, AA Size, 16
Price: $21.40

Sub: $187.30
Shipping: $13.67
Total: $200.97

John Hintz
2425 Rosewood Court
Gary, MN 57237
Visa: 4929992251203640
Expires: 4/2012
Email: John.C.Hintz@dodgeit.com

Place my order!


Figure 3: Decoy eBay receipt.